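Recent methods for conditional image generation benefit from dense supervision, such as segmentation label maps, to achieve high fidelity. However, dense supervision has rarely been explored for unconditional image generation. Here, we explore the efficacy of dense supervision in unconditional generation and find that generator feature maps can serve as an alternative to costly semantic label maps. Based on this empirical evidence, we propose a new generator-guided discriminator regularization (GGDR), in which the generator feature maps supervise the discriminator so that it learns rich semantic representations in unconditional generation. Specifically, we employ a U-Net architecture for the discriminator, which is trained to predict the generator feature maps given fake images as input. Extensive experiments on multiple datasets show that GGDR consistently improves the performance of baseline methods both quantitatively and qualitatively. Code is available at https://github.com/naver-ai/ggdr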
translated by Google Translate
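The regularization described in the abstract above can be sketched as a distance between the generator's feature map and the map predicted by the discriminator's U-Net decoder. This is a minimal sketch under stated assumptions: the per-pixel cosine distance, the function name `ggdr_loss`, and the nested-list layout `[height][width][channels]` are illustrative choices, since the abstract only states that the discriminator is trained to predict generator feature maps from fake images.

```python
import math


def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def ggdr_loss(generator_features, predicted_features):
    """Mean per-pixel cosine distance between two feature maps.

    Both arguments are nested lists shaped [height][width][channels].
    Cosine distance is an assumption made for this sketch.
    """
    total, count = 0.0, 0
    for g_row, p_row in zip(generator_features, predicted_features):
        for g_vec, p_vec in zip(g_row, p_row):
            total += 1.0 - cosine_similarity(g_vec, p_vec)
            count += 1
    return total / count


# Toy 1x2 feature maps with 2 channels (made-up values).
g = [[[1.0, 0.0], [0.0, 2.0]]]
aligned = [[[2.0, 0.0], [0.0, 1.0]]]      # same directions as g
orthogonal = [[[0.0, 1.0], [3.0, 0.0]]]   # perpendicular to g

loss_aligned = ggdr_loss(g, aligned)        # close to 0
loss_orthogonal = ggdr_loss(g, orthogonal)  # close to 1
```

In a training loop, a term like this would presumably be added to the usual adversarial discriminator loss for fake images, with the generator features treated as fixed targets.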
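With the great success of deep learning in various domains, graph neural networks (GNNs) have also become a dominant approach to graph classification. Using a global readout operation that simply aggregates all node (or node-cluster) representations, existing GNN classifiers obtain a graph-level representation of an input graph and use it to predict the class label. However, such global aggregation does not consider the structural information of each node, which results in a loss of information about the global structure. In particular, it limits discriminative power by enforcing the same classifier weight parameters for all node representations; in practice, each node contributes to the target classes differently depending on its structural semantics. In this work, we propose structural semantic readout (SSRead) to summarize node representations at the position level, which makes it possible to model position-specific weight parameters for classification and to effectively capture graph semantics relevant to the global structure. Given an input graph, SSRead aims to identify structurally meaningful positions by using the semantic alignment between its nodes and structural prototypes, which encode the prototypical features of each position. The structural prototypes are optimized to minimize the alignment cost over all training graphs, while the other GNN parameters are trained to predict the class labels. Our experimental results show that SSRead significantly improves the classification performance and interpretability of GNN classifiers while remaining compatible with a variety of aggregation functions, GNN architectures, and learning frameworks.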
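The contrast between a global readout and a position-level readout can be illustrated with a small sketch. It is not the method from the abstract: the hard nearest-prototype assignment by dot product stands in for the semantic alignment described there, and the function names and toy values are hypothetical.

```python
def global_sum_readout(node_reprs):
    """Sum all node representations into one graph-level vector."""
    dim = len(node_reprs[0])
    return [sum(h[d] for h in node_reprs) for d in range(dim)]


def position_level_readout(node_reprs, prototypes):
    """Pool nodes per position, then concatenate the pooled vectors.

    Assumption: each node goes to the prototype with the largest dot
    product. This replaces the alignment step of the original method.
    """
    dim = len(node_reprs[0])
    pooled = [[0.0] * dim for _ in prototypes]
    for h in node_reprs:
        sims = [sum(a * b for a, b in zip(h, p)) for p in prototypes]
        k = sims.index(max(sims))  # index of the best-matching position
        for d in range(dim):
            pooled[k][d] += h[d]
    # Concatenation keeps positions separate, so a downstream linear
    # classifier can learn different weights for each position.
    return [x for vec in pooled for x in vec]


# Toy graph with three nodes and two structural prototypes (made-up values).
nodes = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
prototypes = [[1.0, 0.0], [0.0, 1.0]]

graph_vec = global_sum_readout(nodes)                    # length 2
position_vec = position_level_readout(nodes, prototypes)  # length 4
```

The global vector has the same length as a single node representation, while the position-level vector has one slot per position, which is what allows position-specific classifier weights.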
Few-shot segmentation (FSS) aims to segment a target class given a small number of labeled images (the support set). To extract information relevant to the target class, a dominant approach in the best-performing FSS baselines removes background features using the support mask. We observe that this support mask presents an information bottleneck in several challenging FSS cases, e.g., for small targets and/or inaccurate target boundaries. To this end, we present a novel method (MSI), which maximizes the support-set information by exploiting two complementary sources of features in generating super correlation maps. We validate the effectiveness of our approach by instantiating it in three recent and strong FSS baselines. Experimental results on several publicly available FSS benchmarks show that our proposed method consistently improves performance by visible margins and allows faster convergence. Our code and models will be publicly released.
For low-level computer vision and image processing ML tasks, training on large datasets is critical for generalization. However, the standard practice of relying on real-world images, primarily from the Internet, comes with image quality, scalability, and privacy issues, especially in commercial contexts. To address this, we have developed a procedural synthetic data generation pipeline and dataset tailored to low-level vision tasks. Our Unreal Engine-based synthetic data pipeline populates large scenes algorithmically with a combination of random 3D objects, materials, and geometric transformations. Then, we calibrate the camera noise profiles to synthesize the noisy images. From this pipeline, we generated a fully synthetic image denoising dataset (FSID), which consists of 175,000 noisy/clean image pairs. We then trained and validated a CNN-based denoising model and demonstrated that the model trained on this synthetic data alone can achieve competitive denoising results when evaluated on real-world noisy images captured with smartphone cameras.
Meta-training, which fine-tunes the language model (LM) on various downstream tasks by maximizing the likelihood of the target label given the task instruction and input instance, has improved the zero-shot task generalization performance. However, meta-trained LMs still struggle to generalize to challenging tasks containing novel labels unseen during meta-training. In this paper, we propose Flipped Learning, an alternative method of meta-training which trains the LM to generate the task instruction given the input instance and label. During inference, the LM trained with Flipped Learning, referred to as Flipped, selects the label option that is most likely to generate the task instruction. On 14 tasks of the BIG-bench benchmark, the 11B-sized Flipped outperforms zero-shot T0-11B and even a 16 times larger 3-shot GPT-3 (175B) on average by 8.4% and 9.7% points, respectively. Flipped gives particularly large improvements on tasks with unseen labels, outperforming T0-11B by up to +20% average F1 score. This indicates that the strong task generalization of Flipped comes from improved generalization to novel labels. We release our code at https://github.com/seonghyeonye/Flipped-Learning.
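The inference rule described above (pick the label option under which the task instruction is most likely) can be sketched as follows. The function `flipped_predict`, the scorer interface, and the numbers in `TOY_SCORES` are hypothetical stand-ins; a real system would presumably obtain the score by summing the language model's token log-probabilities of the instruction given the input and a candidate label.

```python
def flipped_predict(instruction, input_text, label_options, instruction_log_likelihood):
    """Return the label that makes the instruction most likely, plus all scores.

    `instruction_log_likelihood(instruction, input_text, label)` is assumed
    to return log P(instruction | input_text, label) as a float.
    """
    scores = {
        label: instruction_log_likelihood(instruction, input_text, label)
        for label in label_options
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy scorer with made-up log-likelihoods, used only to exercise the rule.
TOY_SCORES = {"positive": -12.5, "negative": -15.0}


def toy_scorer(instruction, input_text, label):
    return TOY_SCORES[label]


prediction, scores = flipped_predict(
    instruction="Is the sentiment of this review positive or negative?",
    input_text="A delightful film from start to finish.",
    label_options=["positive", "negative"],
    instruction_log_likelihood=toy_scorer,
)
```

Because the label appears only on the conditioning side, the label string itself never has to be generated, which is consistent with the abstract's claim of better generalization to labels unseen during meta-training.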
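Previous work has shown that there is a scaling law between the size of language models (LMs) and their zero-shot performance on different downstream NLP tasks. In this work, we show that this phenomenon does not hold when large LMs are evaluated on tasks with negated prompts; instead, they exhibit an inverse scaling law. We evaluate 9 different tasks with negated prompts on (1) pretrained LMs (OPT and GPT-3) of varying sizes (125M to 175B), (2) LMs further pretrained to generalize to novel prompts (InstructGPT), (3) LMs provided with few-shot examples, and (4) LMs fine-tuned specifically on negated prompts. All LM types perform worse on negated prompts and show a large gap relative to human performance when the average scores on original and negated prompts are compared. By highlighting a critical limitation of existing LMs and methods, we urge the community to develop new approaches for building LMs that actually follow the given instructions. We provide the code and datasets for exploring negated prompts at https://github.com/joeljang/negated-prompts-for-llms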
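We study few-shot semantic segmentation, which aims to segment a target object from a query image when provided with a few annotated support images of the target class. Several recent methods resort to a feature masking (FM) technique to discard irrelevant feature activations, which in turn facilitates reliable prediction of the segmentation mask. A fundamental limitation of FM is its inability to preserve the fine-grained spatial details that affect the accuracy of the segmentation mask, especially for small target objects. In this paper, we develop a simple, effective, and efficient approach to enhance feature masking. We call the enhanced FM hybrid masking (HM). Specifically, we compensate for the loss of fine-grained spatial details in the FM technique by investigating and leveraging a complementary basic input masking method. Experiments have been conducted on three publicly available benchmarks with strong few-shot segmentation (FSS) baselines. We show improved performance over the current state-of-the-art methods by visible margins across different benchmarks. Our code and trained models are available at: https://github.com/moonsh/hm-hybrid-masking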
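The limitation of feature masking for small targets can be illustrated with a toy example. Everything in it is an assumption chosen for illustration: the 4x4 image, the stand-in 2x2 feature map (as if produced by a stride-2 backbone), and nearest-neighbor downsampling of the mask. It only shows why a mask resized to feature resolution can lose a small object, while masking the input keeps it.

```python
def downsample_nearest(mask, stride):
    """Nearest-neighbor downsampling: keep every `stride`-th row and column."""
    return [row[::stride] for row in mask[::stride]]


def apply_mask(grid, mask):
    """Element-wise product of two equally sized 2D grids."""
    return [[v * m for v, m in zip(g_row, m_row)]
            for g_row, m_row in zip(grid, mask)]


# Toy 4x4 support image with a single-pixel target at row 1, column 1.
image = [
    [5, 5, 5, 5],
    [5, 9, 5, 5],
    [5, 5, 5, 5],
    [5, 5, 5, 5],
]
mask = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# Input masking: mask the image at full resolution, before feature extraction.
masked_image = apply_mask(image, mask)

# Feature masking: resize the mask to the feature resolution, then mask features.
features = [[1.0, 2.0], [3.0, 4.0]]        # stand-in 2x2 feature map
small_mask = downsample_nearest(mask, 2)   # the single target pixel is dropped
masked_features = apply_mask(features, small_mask)
```

Here the target pixel survives in `masked_image` but `small_mask` is all zeros, so feature masking removes every activation. Combining both kinds of masking, as the abstract proposes, is meant to recover such fine-grained details.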
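We present a framework for aligning and fusing multiple images into a single view using neural image representations (NIRs), also known as implicit or coordinate-based neural representations. Our framework targets burst images that exhibit camera ego-motion and potential changes in the scene. We describe different alignment strategies depending on the nature of the scene motion: perspective planar (i.e., homography), optical flow with minimal scene change, and optical flow with notable occlusion and disocclusion. With the neural image representation, our framework effectively combines multiple inputs into a single canonical view without the need to select one of the images as a reference frame. We demonstrate how to use this multi-frame fusion framework for various layer separation tasks.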
3D-aware image synthesis focuses on preserving spatial consistency in addition to generating high-resolution images with fine details. Recently, the Neural Radiance Field (NeRF) has been introduced for synthesizing novel views with low computational cost and superior performance. While several works investigate generative NeRFs and show remarkable results, they cannot handle conditional and continuous feature manipulation in the generation procedure. In this work, we introduce a novel model, called Class-Continuous Conditional Generative NeRF ($\text{C}^{3}$G-NeRF), which can synthesize conditionally manipulated, photorealistic, 3D-consistent images by projecting conditional features to the generator and the discriminator. The proposed $\text{C}^{3}$G-NeRF is evaluated on three image datasets: AFHQ, CelebA, and Cars. Our model shows strong 3D consistency with fine details and smooth interpolation in conditional feature manipulation. For instance, $\text{C}^{3}$G-NeRF achieves a Fréchet Inception Distance (FID) of 7.64 in 3D-aware face image synthesis at a resolution of $128^{2}$. Additionally, because $\text{C}^{3}$G-NeRF can synthesize class-conditional images, we report FIDs of generated 3D-aware images for each class of the datasets.
Cellular automata (CA) captivate researchers due to the emergent, complex, individualized behavior that simple global rules of interaction enact. Recent advances in the field have combined CA with convolutional neural networks to achieve self-regenerating images. This new branch of CA is called neural cellular automata [1]. The goal of this project is to use the idea of neural cellular automata to grow prediction machines. We place many different convolutional neural networks in a grid. Each conv net cell outputs a prediction of what the next state will be and minimizes its predictive error. Cells receive their neighbors' colors and fitnesses as input. Each cell's fitness score describes how accurate its predictions are. Cells can also move to explore their environment, and some stochasticity is applied to their movement.